Overview

Dataset Statistics

Number of Variables 12
Number of Rows 89392
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 41.0 MB
Average Row Size in Memory 480.5 B
Variable Types
  • Numerical: 3
  • Categorical: 9
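
The total size and the average row size above can be checked against each other. The sketch below is illustrative only: it assumes that "MB" here means 2**20 bytes, which the report does not state, and the constant and function names are invented for the example.

```python
# Values copied from the Dataset Statistics above.
N_ROWS = 89392
AVG_ROW_BYTES = 480.5

# Assumption: "MB" means 2**20 bytes; the report does not say so explicitly.
BYTES_PER_MB = 2 ** 20


def total_size_mb(n_rows, avg_row_bytes):
    """Total in-memory size implied by the row count and the average row size."""
    return n_rows * avg_row_bytes / BYTES_PER_MB


implied_total_mb = total_size_mb(N_ROWS, AVG_ROW_BYTES)
print(f"{implied_total_mb:.1f} MB")
```

Under that assumption the implied total rounds to the 41.0 MB reported above.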

Dataset Insights

id is uniformly distributed
id is flagged as skewed, although its reported skewness is 0
claim_amount is skewed
cltv is skewed
area has constant length 5
marital_status has constant length 1
vintage has constant length 1
policy has constant length 1
claim_amount has 17671 (19.77%) zeros
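
The zero share quoted for claim_amount follows directly from the row count. A small sketch using only figures from this report (the variable names are illustrative):

```python
# Figures copied from the report.
N_ROWS = 89392
ZERO_CLAIMS = 17671

# Share of rows whose claim_amount is exactly zero, as a percentage.
zero_share_pct = 100 * ZERO_CLAIMS / N_ROWS

# Rows with a non-zero claim_amount.
nonzero_rows = N_ROWS - ZERO_CLAIMS

print(f"{zero_share_pct:.2f}% zeros, {nonzero_rows} non-zero rows")
```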

Variables


id

numerical

Approximate Distinct Count 89392
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1430272 B
Mean 44696.5
Minimum 1
Maximum 89392
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is uniformly distributed

Quantile Statistics

Minimum 1
5-th Percentile 4470.55
Q1 22348.75
Median 44696.5
Q3 67044.25
95-th Percentile 84922.45
Maximum 89392
Range 89391
IQR 44695.5
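
The fractional quantiles above are what linear interpolation between neighbouring ranks produces. The sketch below assumes that id is the gap-free sequence 1..89392 (suggested by the minimum, maximum and distinct count, but not stated outright) and that the report interpolates linearly; `percentile` is a helper written for this example, not a function of the reporting tool.

```python
import math


def percentile(sorted_values, p):
    """Percentile p (0..1) using linear interpolation between the nearest ranks."""
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: id takes every integer from 1 to 89392 exactly once.
ids = list(range(1, 89393))

q1 = percentile(ids, 0.25)
q3 = percentile(ids, 0.75)
summary = {
    "p5": percentile(ids, 0.05),
    "q1": q1,
    "median": percentile(ids, 0.50),
    "q3": q3,
    "p95": percentile(ids, 0.95),
    "range": ids[-1] - ids[0],
    "iqr": q3 - q1,
}
print(summary)
```

Under those assumptions each entry agrees with the corresponding value listed above, up to floating-point rounding.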

Descriptive Statistics

Mean 44696.5
Standard Deviation 25805.392
Variance 6.6592e+08
Sum 3.9955e+09
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5773
  • id is not normally distributed (p-value 7.30e-06)
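
The mean, standard deviation, variance, sum and coefficient of variation above can be reproduced for id with the standard library. This is a sketch that assumes id is the sequence 1..89392 with no gaps and that the report uses the sample (n - 1) form of the variance; neither is stated in the report.

```python
import statistics

# Assumption: id takes every integer from 1 to 89392 exactly once.
ids = list(range(1, 89393))

total = sum(ids)
mean = total / len(ids)

# Sample statistics (divide by n - 1).
sample_variance = statistics.variance(ids)
sample_stdev = statistics.stdev(ids)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = sample_stdev / mean

print(mean, round(sample_stdev, 3), round(coefficient_of_variation, 4))
```

With the n - 1 form the standard deviation rounds to the 25805.392 shown above; dividing by n instead would give roughly 25805.25, so the reported figure is consistent with the sample form.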

gender

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6245838 B

Length

Mean 4.8702
Standard Deviation 0.9915
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Male
3rd row Male
4th row Female
5th row Male

Letter

Count 435358
Lowercase Letter 345966
Space Separator 0
Uppercase Letter 89392
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Male, Female) account for over 50.0% of rows
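
The Letter rows break each value down by Unicode general category. In every variable of this report "Count" equals the sum of the lowercase and uppercase letter rows, so the sketch below treats it as the total number of letters. The function name and label mapping are illustrative and not the reporting tool's own code.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes mapped to the labels used in this report.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def character_profile(values):
    """Tally characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            category = unicodedata.category(ch)
            if category.startswith("L"):
                counts["Count"] += 1  # every letter, regardless of case
            label = CATEGORY_LABELS.get(category)
            if label is not None:
                counts[label] += 1
    return counts


# The five sample rows shown above for gender.
sample = ["Male", "Male", "Male", "Female", "Male"]
profile = character_profile(sample)
print(dict(profile))
```

Applied to a value such as "5L-10L" from the income variable, the same function counts two uppercase letters, one dash and three decimal digits.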

area

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6257440 B
  • The largest value (Urban) is over 2.32 times larger than the second largest value (Rural)

Length

Mean 5
Standard Deviation 0
Median 5
Minimum 5
Maximum 5

Sample

1st row Urban
2nd row Rural
3rd row Urban
4th row Rural
5th row Urban

Letter

Count 446960
Lowercase Letter 357568
Space Separator 0
Uppercase Letter 89392
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Urban, Rural) account for over 50.0% of rows
  • The most frequent word (urban) occurs over 2.32 times as often as the second most frequent word (rural)
  • area has words of constant length

qualification

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6656865 B

Length

Mean 9.4682
Standard Deviation 1.6334
Median 11
Minimum 6
Maximum 11

Sample

1st row Bachelor
2nd row High School
3rd row Bachelor
4th row High School
5th row High School

Letter

Count 800138
Lowercase Letter 664499
Space Separator 46247
Uppercase Letter 135639
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (High School, Bachelor) account for over 50.0% of rows

income

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6417614 B
  • The largest value (5L-10L) is over 2.49 times larger than the second largest value (2L-5L)

Length

Mean 6.7918
Standard Deviation 2.68
Median 6
Minimum 4
Maximum 13

Sample

1st row 5L-10L
2nd row 5L-10L
3rd row 5L-10L
4th row 5L-10L
5th row More than 10L

Letter

Count 272578
Lowercase Letter 95648
Space Separator 27328
Uppercase Letter 176930
Dash Punctuation 73874
Decimal Number 229646
  • The top 2 categories (5L-10L, 2L-5L) account for over 50.0% of rows
  • The most frequent word (5l10l) occurs over 2.49 times as often as the second most frequent word (2l5l)

marital_status

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5899872 B

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 1
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 89392
  • The top 2 categories (1, 0) account for over 50.0% of rows
  • marital_status has words of constant length

vintage

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5899872 B

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 5
2nd row 8
3rd row 8
4th row 7
5th row 6

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 89392
  • vintage has words of constant length

claim_amount

numerical

Approximate Distinct Count 10889
Approximate Unique (%) 12.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1430272 B
Mean 4351.5024
Minimum 0
Maximum 31894
Zeros 17671
Zeros (%) 19.8%
Negatives 0
Negatives (%) 0.0%
  • claim_amount is skewed right (γ1 = 1.0442)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 2406
Median 4089
Q3 6094
95-th Percentile 10078
Maximum 31894
Range 31894
IQR 3688

Descriptive Statistics

Mean 4351.5024
Standard Deviation 3262.3598
Variance 1.0643e+07
Sum 3.8899e+08
Skewness 1.0442
Kurtosis 3.2313
Coefficient of Variation 0.7497
  • claim_amount is not normally distributed (p-value 1.56e-10)
  • claim_amount has 2258 outliers
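
The report does not say how outliers are identified. A common convention is Tukey's rule, which flags values more than 1.5 × IQR outside the quartiles. The sketch below applies that rule to the quartiles listed above; whether the count of 2258 was produced this way is an assumption, and `tukey_fences` is a helper written for this example.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences, k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Q1 and Q3 of claim_amount, copied from the Quantile Statistics above.
claim_low, claim_high = tukey_fences(2406, 6094)
print(claim_low, claim_high)
```

Under that rule the lower fence is negative, so with a minimum of 0 only large claims could be flagged. The upper fence lies above the 95-th percentile (10078) and below the maximum (31894), so fewer than 5% of rows could exceed it, which is at least compatible with 2258 of 89392 rows being outliers.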

num_policies

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6502502 B
  • The largest value (More than 1) is over 2.07 times larger than the second largest value (1)

Length

Mean 7.7414
Standard Deviation 4.687
Median 11
Minimum 1
Maximum 11

Sample

1st row More than 1
2nd row More than 1
3rd row More than 1
4th row More than 1
5th row More than 1

Letter

Count 482104
Lowercase Letter 421841
Space Separator 120526
Uppercase Letter 60263
Dash Punctuation 0
Decimal Number 89392
  • The top 2 categories (More than 1, 1) account for over 50.0% of rows
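
The category counts behind the 2.07 ratio are not listed, but they can be inferred from the character tallies. The sketch assumes that every value is either "More than 1" or "1", as the distinct count of 2 suggests; each "More than 1" then contributes exactly one uppercase letter. Variable names are illustrative.

```python
# Figures copied from the report.
N_ROWS = 89392
UPPERCASE_LETTERS = 60263

# Assumption: the only values are "More than 1" and "1".
# One capital "M" per "More than 1" row gives that category's row count.
more_than_one = UPPERCASE_LETTERS
exactly_one = N_ROWS - more_than_one
ratio = more_than_one / exactly_one

# Cross-checks against the other rows of the report:
implied_spaces = 2 * more_than_one        # two spaces per "More than 1"
implied_lowercase = 7 * more_than_one     # "ore" + "than" = 7 lowercase letters
implied_mean_length = (11 * more_than_one + 1 * exactly_one) / N_ROWS

print(exactly_one, round(ratio, 2), round(implied_mean_length, 4))
```

The implied space count, lowercase count and mean length match the values reported for num_policies, which supports the two-value assumption.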

policy

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5899872 B
  • The largest value (A) is over 2.3 times larger than the second largest value (B)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row A
2nd row A
3rd row A
4th row A
5th row A

Letter

Count 89392
Lowercase Letter 0
Space Separator 0
Uppercase Letter 89392
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (A, B) account for over 50.0% of rows
  • The most frequent word (a) occurs over 2.3 times as often as the second most frequent word (b)
  • policy has words of constant length

type_of_policy

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6401252 B
  • The largest value (Platinum) is over 2.27 times larger than the second largest value (Silver)

Length

Mean 6.6088
Standard Deviation 1.6399
Median 8
Minimum 4
Maximum 8

Sample

1st row Platinum
2nd row Platinum
3rd row Platinum
4th row Platinum
5th row Gold

Letter

Count 590772
Lowercase Letter 501380
Space Separator 0
Uppercase Letter 89392
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Platinum, Silver) account for over 50.0% of rows
  • The most frequent word (platinum) occurs over 2.27 times as often as the second most frequent word (silver)

cltv

numerical

Approximate Distinct Count 18796
Approximate Unique (%) 21.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1430272 B
Mean 97952.829
Minimum 24828
Maximum 724068
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • cltv is skewed right (γ1 = 2.753)

Quantile Statistics

Minimum 24828
5-th Percentile 31692
Q1 52836
Median 66396
Q3 103440
95-th Percentile 307265.4
Maximum 724068
Range 699240
IQR 50604

Descriptive Statistics

Mean 97952.829
Standard Deviation 90613.8148
Variance 8.2109e+09
Sum 8.7562e+09
Skewness 2.753
Kurtosis 8.3333
Coefficient of Variation 0.9251
  • cltv is not normally distributed (p-value 2.03e-14)
  • cltv has 10223 outliers
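
Skewness (γ1) and kurtosis are ratios of central moments. The sketch below uses the population (divide by n) definitions and reports excess kurtosis, which is 0 for a normal distribution. Whether the report applies a bias correction is not stated, so its exact values may differ slightly. The toy data are hypothetical and not drawn from the dataset.

```python
import math


def central_moment(values, order):
    """Central moment of the given order, dividing by n (population form)."""
    mean = math.fsum(values) / len(values)
    return math.fsum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical toy data: one large value stretches the right tail,
# so the skewness comes out positive.
toy_values = [30, 50, 60, 70, 100, 700]
toy_skew = skewness(toy_values)
toy_kurt = excess_kurtosis(toy_values)
print(toy_skew, toy_kurt)
```

A long right tail of this kind also pulls the mean above the median, as seen for cltv (mean 97952.829 against median 66396).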

Interactions

Correlations

Missing Values